
Conversation


@RahulC7 RahulC7 commented Nov 28, 2025

Summary:

Context

We continue from D84284794 to add support for 16-bit activations. Note that although 16-bit activations are already supported, that is only the case when the weights are also 16-bit. To lift this restriction, we need to change the way we template some functions.

Current Behavior

Right now, we're composing two macros together, the ET_FORALL_JARVIS_QUANTIZED_TYPES_WITH_INT16 macro:

https://www.internalfb.com/code/fbsource/[9e8c6d8466107f58aa3de1b9e4ec71c49d670a8f]/fbcode/on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/operators.h?lines=22-25

and the function macro (`quantized_linear`, chosen as an example):

https://www.internalfb.com/code/fbsource/[9e8c6d8466107f58aa3de1b9e4ec71c49d670a8f]/fbcode/on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/quantized_linear_out.cpp?lines=30-41

Composed, they expand into a switch statement that calls the `quantized_linear` function with the correct template parameter.

However, this assumes that the input activations and weights share the same dtype, which does not hold for 16-bit activations with 8-bit weights.
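The macro-composition pattern described above can be sketched as follows. This is an illustrative toy, not the actual Jarvis code: `FORALL_QUANTIZED_TYPES`, `CASE`, and `quantized_linear_` are hypothetical names standing in for the real macros and kernel.

```cpp
#include <cassert>
#include <cstdint>

enum class ScalarType { Char, Short };

// FORALL-style macro: invokes _() once per supported quantized dtype.
#define FORALL_QUANTIZED_TYPES(_) \
  _(int8_t, Char)                 \
  _(int16_t, Short)

// Toy kernel templated on a single dtype T, used for both activations
// and weights -- the very assumption this stack of diffs removes.
template <typename T>
int64_t quantized_linear_(int64_t activation, int64_t weight) {
  return static_cast<int64_t>(static_cast<T>(activation)) *
         static_cast<int64_t>(static_cast<T>(weight));
}

// Composing the two macros yields one switch case per dtype, each
// instantiating the kernel with the matching template parameter.
int64_t dispatch(ScalarType dtype, int64_t activation, int64_t weight) {
#define CASE(ctype, name) \
  case ScalarType::name:  \
    return quantized_linear_<ctype>(activation, weight);
  switch (dtype) { FORALL_QUANTIZED_TYPES(CASE) }
#undef CASE
  return 0;
}
```

Because the switch selects a single `T` for both operands, there is no case that pairs int16 activations with int8 weights, which is exactly the limitation this diff works around.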

This Diff

We now use the generic implementation in the backends and create a unit test as well as e2e tests.

Differential Revision: D87997149

… for linear in backends

Summary:
# Context
We continue from D84284794 to add support for 16-bit activations. Note that although 16-bit activations are already supported, that is only the case when the weights are also 16-bit. To lift this restriction, we need to change the way we template some functions.


# Current Behavior
Right now, we're composing two macros together, the `ET_FORALL_JARVIS_QUANTIZED_TYPES_WITH_INT16` macro: 

https://www.internalfb.com/code/fbsource/[9e8c6d8466107f58aa3de1b9e4ec71c49d670a8f]/fbcode/on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/operators.h?lines=22-25


and the function macro (`quantized_linear`, chosen as an example):

https://www.internalfb.com/code/fbsource/[9e8c6d8466107f58aa3de1b9e4ec71c49d670a8f]/fbcode/on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/quantized_linear_out.cpp?lines=30-41



Composed, they expand into a switch statement that calls the `quantized_linear` function with the correct template parameter.

However, this assumes that the input activations and weights share the same dtype, which does not hold for 16-bit activations with 8-bit weights.

# This Diff
We finish by using the generic implementation for all the backends and adding e2e tests as well as unit tests.

Reviewed By: hsharma35

Differential Revision: D87946776
…for Conv2D in Backends (pytorch#16007)

Summary:

# Context
We continue from D84284794 to add support for 16-bit activations. Note that although 16-bit activations are already supported, that is only the case when the weights are also 16-bit. To lift this restriction, we need to change the way we template some functions.


# Current Behavior
Right now, we're composing two macros together, the `ET_FORALL_JARVIS_QUANTIZED_TYPES_WITH_INT16` macro: 

https://www.internalfb.com/code/fbsource/[9e8c6d8466107f58aa3de1b9e4ec71c49d670a8f]/fbcode/on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/operators.h?lines=22-25


and the function macro (`quantized_linear`, chosen as an example):

https://www.internalfb.com/code/fbsource/[9e8c6d8466107f58aa3de1b9e4ec71c49d670a8f]/fbcode/on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/quantized_linear_out.cpp?lines=30-41



Composed, they expand into a switch statement that calls the `quantized_linear` function with the correct template parameter.

However, this assumes that the input activations and weights share the same dtype, which does not hold for 16-bit activations with 8-bit weights.

# This Diff
We finish by using the generic implementation for all the backends and adding e2e tests as well as unit tests.

Differential Revision: D87993325
… for matmul in backends

Summary:
# Context
We continue from D84284794 to add support for 16-bit activations. Note that although 16-bit activations are already supported, that is only the case when the weights are also 16-bit. To lift this restriction, we need to change the way we template some functions.


# Current Behavior
Right now, we're composing two macros together, the `ET_FORALL_JARVIS_QUANTIZED_TYPES_WITH_INT16` macro: 

https://www.internalfb.com/code/fbsource/[9e8c6d8466107f58aa3de1b9e4ec71c49d670a8f]/fbcode/on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/operators.h?lines=22-25


and the function macro (`quantized_linear`, chosen as an example):

https://www.internalfb.com/code/fbsource/[9e8c6d8466107f58aa3de1b9e4ec71c49d670a8f]/fbcode/on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/quantized_linear_out.cpp?lines=30-41


Composed, they expand into a switch statement that calls the `quantized_linear` function with the correct template parameter.

However, this assumes that the input activations and weights share the same dtype, which does not hold for 16-bit activations with 8-bit weights.

# This Diff
We now use the generic implementation in the backends and create a unit test as well as e2e tests.

Differential Revision: D87997149
Copilot AI review requested due to automatic review settings November 28, 2025 21:00

pytorch-bot bot commented Nov 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16008

Note: Links to docs will display an error until the docs builds have been completed.

❗ 2 Active SEVs

There are 2 currently active SEVs. If your PR is affected, please view them below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 28, 2025

meta-codesync bot commented Nov 28, 2025

@RahulC7 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87997149.


Copilot AI left a comment


Pull request overview

This PR adds support for 16-bit activations with 8-bit weights (W8A16) for quantized operations in the Cadence HiFi backend by delegating to generic implementations when the heterogeneous type combination is detected.

Key Changes:

  • Modified backend operators (matmul, linear, conv2d) to detect and handle int16 activation + int8 weight combinations
  • Added build dependencies on generic operator implementations for W8A16 support
  • Created comprehensive unit tests for the new W8A16 functionality
  • Added new quantizer classes for 16-bit activation support in conv and matmul operations
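The detect-and-delegate pattern the first bullet describes can be sketched as below. This is a minimal illustration under assumed names: `Tensor`, `ScalarType`, `generic_quantized_linear`, and `optimized_quantized_linear` are stand-ins, not the actual ExecuTorch/Cadence signatures.

```cpp
#include <cassert>

enum class ScalarType { Char, Short };  // int8 / int16, illustrative

struct Tensor {
  ScalarType dtype;
};

ScalarType scalar_type(const Tensor& t) { return t.dtype; }

bool used_generic = false;  // instrumentation for this sketch only

// Stand-in for the portable/generic W8A16-capable implementation.
void generic_quantized_linear(const Tensor&, const Tensor&) {
  used_generic = true;
}

// Stand-in for the existing optimized HiFi kernel (homogeneous dtypes).
void optimized_quantized_linear(const Tensor&, const Tensor&) {
  used_generic = false;
}

void quantized_linear_out(const Tensor& input, const Tensor& weight) {
  // Heterogeneous W8A16 combination: fall back to the generic kernel,
  // since the optimized path assumes matching input/weight dtypes.
  if (scalar_type(input) == ScalarType::Short &&
      scalar_type(weight) == ScalarType::Char) {
    generic_quantized_linear(input, weight);
    return;
  }
  optimized_quantized_linear(input, weight);
}
```

The same early-return check is applied per operator (matmul, linear, conv2d NCHW/NHWC) in the files listed below.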

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

Show a summary per file

| File | Description |
| --- | --- |
| backends/cadence/hifi/operators/op_quantized_matmul_out.cpp | Added W8A16 type check and generic implementation call for int16 activations |
| backends/cadence/hifi/operators/op_quantized_linear_out.cpp | Added W8A16 type checks for both per-channel and per-tensor variants |
| backends/cadence/hifi/operators/op_quantized_conv2d_nchw_out.cpp | Added W8A16 handling for NCHW format convolutions |
| backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_out.cpp | Added W8A16 handling for NHWC format convolutions |
| backends/cadence/hifi/operators/targets.bzl | Updated build configuration with generic operator dependencies and removed redundant operator entries |
| backends/cadence/hifi/operators/op_quantized_matmul_out.h | Added public header for quantized_matmul_out function declaration |
| backends/cadence/hifi/operators/tests/test_op_quantized_matmul_out.cpp | Added unit tests for W8A16 matmul with transposed and non-transposed cases |
| backends/cadence/hifi/operators/tests/test_op_quantized_linear_out.cpp | Added unit test for W8A16 linear operation |
| backends/cadence/hifi/operators/tests/test_op_quantized_conv2d_out.cpp | Added unit tests for W8A16 conv2d in both NCHW and NHWC formats |
| backends/cadence/aot/quantizer/quantizer.py | Added quantizer classes for 16-bit activation support in conv and matmul patterns |


Comment on lines +238 to +239
}
else if (out.scalar_type() == executorch::aten::ScalarType::Byte) {

Copilot AI Nov 28, 2025


Missing space after closing brace on line 238. The 'else' should be on the same line as the closing brace of the preceding if block, or there should be consistent formatting with other similar conditionals in the file.

Suggested change
}
else if (out.scalar_type() == executorch::aten::ScalarType::Byte) {
} else if (out.scalar_type() == executorch::aten::ScalarType::Byte) {

Comment on lines +296 to +297
}
else if (out.scalar_type() == executorch::aten::ScalarType::Byte) {

Copilot AI Nov 28, 2025


Missing space after closing brace on line 296. The 'else' should be on the same line as the closing brace of the preceding if block, or there should be consistent formatting with other similar conditionals in the file.

Suggested change
}
else if (out.scalar_type() == executorch::aten::ScalarType::Byte) {
} else if (out.scalar_type() == executorch::aten::ScalarType::Byte) {

// With all ones input and weights, and kernel size 3x5=15 * 8 channels = 120
// After applying out_multiplier (0.5 * 2^31), the value is scaled by 0.5
// Expected value: 120 * 0.5 = 60
Tensor expected = tf_int16.full({1, 16, 18, 24}, 120);

Copilot AI Nov 28, 2025


The expected output value (120) doesn't match the comment's calculation (120 * 0.5 = 60). Based on the comment logic and the out_multiplier value, the expected tensor should contain 60, not 120.

Suggested change
Tensor expected = tf_int16.full({1, 16, 18, 24}, 120);
Tensor expected = tf_int16.full({1, 16, 18, 24}, 60);

// With all ones input and weights, and kernel size 3x5=15 * 8 channels = 120
// After applying out_multiplier (0.5 * 2^31), the value is scaled by 0.5
// Expected value: 120 * 0.5 = 60
Tensor expected = tf_int16.full({1, 18, 24, 16}, 120);

Copilot AI Nov 28, 2025


The expected output value (120) doesn't match the comment's calculation (120 * 0.5 = 60). Based on the comment logic and the out_multiplier value, the expected tensor should contain 60, not 120.

Suggested change
Tensor expected = tf_int16.full({1, 18, 24, 16}, 120);
Tensor expected = tf_int16.full({1, 18, 24, 16}, 60);

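The arithmetic in the two test comments above (an `out_multiplier` of 0.5 * 2^31 halving an accumulator of 120) can be checked with a small sketch. The rounding-free 31-bit shift below is a simplification of real requantization code; the function name is illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Applies a Q31 fixed-point multiplier to an int32 accumulator:
// result = acc * (out_multiplier / 2^31), computed in 64-bit to
// avoid overflow, truncated rather than rounded for simplicity.
int32_t apply_out_multiplier(int32_t acc, int32_t out_multiplier) {
  return static_cast<int32_t>(
      (static_cast<int64_t>(acc) * out_multiplier) >> 31);
}
```

With `out_multiplier = 0.5 * 2^31 = 2^30`, an accumulator of 120 rescales to 60, which supports Copilot's point that the expected tensors should hold 60 rather than 120 (unless the tests actually configure a unit multiplier).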
return;
}

bool optimized = 0;

Copilot AI Nov 28, 2025


Variable declaration for 'optimized' appears after the early return, but it was previously at the beginning of the function. This creates inconsistent code structure and the variable is now unreachable when the W8A16 path is taken. Consider moving this declaration back before the W8A16 conditional check for consistency with the original code structure.

// Using simple values for testing
Tensor X = tf_int16.ones({64, 33});
Tensor Y = tf_int8.ones({33, 128});
// Bias not used

Copilot AI Nov 28, 2025


Comment states 'Bias not used' but a bias tensor is created and passed to the function. Either remove the misleading comment or clarify that while bias is passed, it may not affect the expected output in this test case.

Suggested change
// Bias not used
// Bias is passed but does not affect the expected output in this test case

Comment on lines +112 to +113
// Bias not used
Tensor bias = tf_int32.full({128}, -30);

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment states 'Bias not used' but a bias tensor is created and passed to the function. Either remove the misleading comment or clarify that while bias is passed, it may not affect the expected output in this test case.


Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant